Deskewing global clock skew using localized DLLs

ABSTRACT

A method for reducing global clock skew by referencing a first point on an integrated circuit to which to align other points on the integrated circuit is provided. Further, an integrated circuit that has localized DLLs having adjustable buffers that selectively drive a signal on a clock grid is provided. Further, a technique for using a local DLL, one or more phase detectors, and one or more DLLs connected to portions of a clock grid to reduce clock skew is provided.

BACKGROUND OF INVENTION

[0001] A typical computer system includes at least a microprocessor and some form of memory. The microprocessor has, among other components, arithmetic, logic, and control circuitry that interpret and execute instructions necessary for the operation and use of the computer system. FIG. 1 shows a typical computer system (10) having a microprocessor (12), memory (14), integrated circuits (16) that have various functionalities, and communication paths (18), i.e., buses and wires, that are necessary for the transfer of data among the aforementioned components of the computer system (10).

[0002] The components of a computer system use a reference of time to perform the various operations of the computer system. This reference of time is provided to the components of the computer system using one or more clock signals. The components use the one or more clock signals to determine when to conduct certain operations. As computer systems continue to operate at ever-increasing frequencies, it becomes more and more important to ensure that the components of a computer system receive their clock signals in an accurate and timely manner because a mistiming has the potential to cause an error, performance setback, or an outright malfunction of the computer system.

[0003] FIG. 2 shows a clock distribution network (20) for a microprocessor (12). A reference clock (also known in the art as “system clock” and shown in FIG. 2 as REF_CLK), which is typically generated from outside the microprocessor (12), serves as an input to a phase locked loop (“PLL”) (15). Essentially, the PLL (15) uses feedback to maintain a specific phase relationship between its output (shown in FIG. 2 as CHIP_CLK) and the reference signal. The chip clock from the PLL (15) is then distributed to one or more clock drivers/buffers (17), which, in turn, distribute the chip clock to a global clock grid (19), where the global clock grid (19) feeds the chip clock to various microprocessor components such as local clock grids (24) and a feedback loop (26) that feeds the chip clock back to the PLL (15). The local clock grids (24) feed the chip clock to base components of the microprocessor (12), such as latches (22) and flip-flops (28).

[0004] As a clock signal, such as the chip clock shown in FIG. 2, is propagated to the various parts and components of a microprocessor, one or more types of system variations may alter the behavior and/or integrity of the clock signal. Common system variations include, but are not limited to, power variations, temperature variations, and process variations. Due to these and other variations across a microprocessor, a particular clock signal may arrive at different parts of the microprocessor at different times. This difference in the arrival of a clock signal at different system components is referred to and known in the art as “clock skew.”
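As an illustration of the definition above, clock skew can be computed from a set of clock arrival times. The following is a minimal sketch only: the function name, the component names, and the picosecond values are hypothetical, and taking the spread between the latest and earliest arrival is one common convention rather than something the text prescribes.

```python
def clock_skew(arrival_times_ps):
    """Return the spread between the latest and earliest clock arrival.

    arrival_times_ps: iterable of arrival times in picoseconds
    (hypothetical values; the text does not give any figures).
    """
    times = list(arrival_times_ps)
    if not times:
        raise ValueError("at least one arrival time is required")
    return max(times) - min(times)


# Hypothetical arrival times of the same clock edge at three components.
arrivals_ps = {"latch_a": 102, "latch_b": 110, "flop_c": 97}
print(clock_skew(arrivals_ps.values()))  # 13
```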

[0005] As partly discussed above, clock skew is a function of architectural factors such as load, device distribution across a microprocessor, device mismatch, and temperature and voltage gradients across the microprocessor. By designing a microprocessor that accounts for some of these variations, the amount of clock skew in the microprocessor may be reduced. The process of removing or decreasing clock skew is referred to and known in the art as “deskewing.”

[0006] Clock deskewing is typically performed in an upper distribution layer of a clock distribution network. For example, in a clock distribution network that has a global and a local layer, deskewing is performed in the global layer. Similarly, in a network that has a global, a regional, and a local layer, deskewing is performed in the global and/or regional layers.

[0007] FIG. 3 shows a typical clock distribution network (40) having a global distribution layer (42), a regional distribution layer (44), and a local distribution layer (46), where clock deskewing occurs in the regional distribution layer (44). In FIG. 3, a PLL (48) distributes a chip clock to a set of one or more clock drivers/buffers (50), which, in turn, output the chip clock to a set of deskewing buffers (52) in the regional distribution layer (44). The deskewing buffers (52) deskew the chip clock and then distribute the deskewed chip clock to a global clock grid (54), which is connected to one or more local clock grids (58), where the local clock grids (58) are connected to microprocessor components such as latches (56), flip-flops (60), and other types of circuit elements (not shown).

[0008] Deskewing buffers, as shown in FIG. 3, are typically implemented as delay locked loops (DLLs). A DLL is a component that uses a control signal to maintain an output signal in a specific delay relationship with an input signal. The control signal indicates to the DLL how much delay, if any, the DLL needs to insert into the output signal. Because the amount of delay a DLL inserts is typically not a constant or predefined value, the DLL is known as a “variable delay circuit.”

[0009] As shown in FIG. 3, when deskewing buffers are included in the regional distribution layer of a clock distribution network, adjusting global clock skew is a less onerous task. However, such deskewing does not account for clock skew contributed by devices and variations in the local distribution layer (such clock skew is referred to and known in the art as “localized clock skew”).

SUMMARY OF INVENTION

[0010] According to one aspect of the present invention, an integrated circuit comprises a local delay locked loop, an adjustable delay locked loop comprising a tunable buffer that is connected to a clock grid, and a phase detector that indicates to the adjustable delay locked loop whether the tunable buffer needs to be modulated based on a reference clock and a feedback clock, where the reference clock is operatively connected to the local delay locked loop, and where the feedback clock is operatively connected to the clock grid.

[0011] According to another aspect, a method for reducing clock skew on a clock grid comprises inputting a reference clock operatively connected to a local delay locked loop, where the local delay locked loop resides at a first location, inputting a feedback clock operatively connected to the clock grid, determining whether the feedback clock needs to be modulated based on the reference clock, and selectively adjusting a delay of a buffer connected to the clock grid dependent on the determination.

[0012] According to another aspect, a method for decreasing clock skew comprises referencing a point on a clock grid to which to align other points on the clock grid by varying at least one delay of at least one tunable buffer connected to the clock grid.

[0013] According to another aspect, an integrated circuit having a clock grid comprises referencing means for referencing a first point on the clock grid to align at least one other point.

[0014] Other aspects and advantages of the invention will be apparent from the following description and the appended claims.

BRIEF DESCRIPTION OF DRAWINGS

[0015] FIG. 1 shows a typical computer system.

[0016] FIG. 2 shows a typical clock distribution network.

[0017] FIG. 3 shows a typical clock distribution network.

[0018] FIG. 4a shows a component layout in accordance with an embodiment of the present invention.

[0019] FIG. 4b shows a component layout in accordance with the embodiment shown in FIG. 4a.

[0020] FIG. 5 shows a design in accordance with an embodiment of the present invention.

DETAILED DESCRIPTION

[0021] Embodiments of the present invention relate to a method for reducing global clock skew in an integrated circuit. Embodiments of the present invention further relate to a circuit device implementation that reduces global clock skew. Embodiments of the present invention further relate to a technique for increasing computer system performance by reducing system uncertainties associated with clock skew in the computer system.

[0022] Particularly, the present invention relates to a technique for referencing a point on a clock grid to which to align other points on the clock grid by varying one or more delays of clock buffers associated with the clock grid. This technique helps decrease clock skew by reducing systematic components of clock skew in an integrated circuit.

[0023] FIG. 4a shows an exemplary component layout of an integrated circuit (80) in accordance with an embodiment of the present invention. A local DLL (82) of the integrated circuit (80) outputs to one or more adjustable DLLs (84) (shown in FIG. 4a as DLL′). This output from the local DLL (82) serves as a feedback clock to a PLL (not shown). The adjustable DLLs (84) use the output of the local DLL (82) as a reference to lock to. Specifically with reference to FIG. 4a, the adjustable DLLs (84) in the top and bottom rows lock to the adjustable DLLs (84) in the middle row, and the adjustable DLLs (84) in the middle row lock to the local DLL (82). Positioned in between a pair of adjustable DLLs (84) is a phase detector (86), where the phase detector (86) outputs to an associated adjustable DLL (84). Delay elements within the adjustable DLLs (84) are replicated within a region of a global clock grid (not shown). Those skilled in the art will appreciate that such replication may result in replacing a portion of a global clock grid driver.

[0024] FIG. 4b shows a detailed structure of a section (87) of the integrated circuit (80) in accordance with the embodiment shown in FIG. 4a. The phase detector (86) receives both a reference clock (shown in FIG. 4b as REF_CLK) from an adjacent adjustable DLL (not shown in FIG. 4b) and a feedback clock (shown in FIG. 4b as FEEDBACK) from the adjustable DLL (84) with which it is associated. The phase detector (86) determines whether a time phase of the feedback clock is aligned with the time phase of the reference clock. When the time phases of the feedback clock and the reference clock are not aligned, the phase detector (86) indicates this on an up/down signal (shown in FIG. 4b as UP/DOWN) to a finite state machine (88) (shown in FIG. 4b as FSM) inside the adjustable DLL (84). The up/down signal generated by the phase detector (86) indicates whether the feedback clock needs to be sped up (by an “up” indication) or whether the feedback clock needs to be slowed down (by a “down” indication). Those skilled in the art will appreciate that the reference clock and feedback clock into the phase detector (86) may be implemented in one or more metal layers of an integrated circuit in order to decrease the effect of transistor and voltage variations.
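The up/down decision described above can be sketched behaviorally. This is a minimal model under stated assumptions, not the circuit of FIG. 4b: the function name, the use of edge arrival times in picoseconds, and the optional dead zone are all hypothetical. It assumes that a feedback edge arriving after the reference edge means the feedback clock needs to be sped up.

```python
def phase_detect(ref_edge_ps, fb_edge_ps, dead_zone_ps=0):
    """Compare one reference edge with one feedback edge.

    Returns "up" when the feedback edge is late (feedback clock needs to
    be sped up), "down" when it is early (needs to be slowed down), and
    None when the two edges are aligned within the assumed dead zone.
    """
    error_ps = fb_edge_ps - ref_edge_ps
    if error_ps > dead_zone_ps:
        return "up"
    if error_ps < -dead_zone_ps:
        return "down"
    return None


print(phase_detect(100, 105))  # up
print(phase_detect(100, 95))   # down
print(phase_detect(100, 100))  # None
```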

[0025] The finite state machine (88) uses a counter function that counts the number of times the up/down signal is up and the number of times the up/down signal is down. Using these counts, the finite state machine (88) generates control bits on a multi-bit control bus (89) that is used to modulate the delay of one or more tunable buffers (90).
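One way such a counter-based finite state machine might behave is sketched below. The text states only that up and down indications are counted and that control bits result. The class name, the decision window, the single-step update, the thermometer encoding, and the choice that "up" asserts one more control bit are all assumptions made for illustration.

```python
class SkewFsm:
    """Behavioral sketch of an up/down counting FSM (hypothetical design)."""

    def __init__(self, num_bits=8, window=4, initial_code=None):
        self.num_bits = num_bits
        self.window = window  # assumed number of indications per decision
        self.code = num_bits // 2 if initial_code is None else initial_code
        self.up_count = 0
        self.down_count = 0

    def sample(self, indication):
        """Record one "up"/"down" indication and return the control bits."""
        if indication == "up":
            self.up_count += 1
        elif indication == "down":
            self.down_count += 1
        if self.up_count + self.down_count >= self.window:
            # Assumed policy: the majority indication moves the code one step.
            if self.up_count > self.down_count:
                self.code = min(self.num_bits, self.code + 1)
            elif self.down_count > self.up_count:
                self.code = max(0, self.code - 1)
            self.up_count = 0
            self.down_count = 0
        return self.control_bits()

    def control_bits(self):
        """Thermometer-coded bus: the first `code` bits are asserted."""
        return [1] * self.code + [0] * (self.num_bits - self.code)


fsm = SkewFsm(num_bits=4, window=2, initial_code=2)
print(fsm.sample("up"))  # [1, 1, 0, 0]
print(fsm.sample("up"))  # [1, 1, 1, 0]
```

Averaging over a window rather than reacting to every indication is a design choice in this sketch; it keeps a single noisy comparison from changing the control bus.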

[0026] The tunable buffers (90) in the adjustable DLL (84) are essentially delay elements that are replicated and distributed across a global clock grid (94). The tunable buffers (90) are interleaved between regular global clock grid buffers (92). With a proper control setting into the tunable buffers (90), the delay of the global clock grid (94) may be modulated. For example, as a default control setting, a tunable buffer (90) has the same drive strength as a regular global clock grid buffer (92). As the control changes, the resulting delay of the tunable buffers (90) is interpolated with that of the regular global clock grid buffers (92) over the global clock grid (94), thereby changing an overall delay of the global clock grid (94).

[0027] FIG. 5 shows an exemplary design of a tunable buffer (90) in accordance with an embodiment of the present invention. The tunable buffer (90) is implemented using parallel transistor stacks (94). The gate of each transistor in a transistor stack is connected to a particular control bit on the control bus (89). When a control bit is asserted, the corresponding transistor stack contributes to an output current of the tunable buffer (90), thereby increasing the tunable buffer's drive strength. Conversely, when the control bit is deasserted, the corresponding transistor stack is tri-stated, thereby reducing the tunable buffer's drive strength.
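The relationship between control bits and drive strength described above can be modeled to first order. This is a sketch under assumptions not found in the text: each enabled stack is taken to contribute one equal unit of current, one stack is assumed to be always on, and delay is assumed to be inversely proportional to total drive strength. The function and parameter names are hypothetical.

```python
def drive_strength(control_bits, always_on_stacks=1):
    """Count the stacks that contribute output current.

    An asserted bit enables its stack; a deasserted bit leaves the stack
    tri-stated. `always_on_stacks` is an assumed fixed contribution.
    """
    return always_on_stacks + sum(1 for bit in control_bits if bit)


def relative_delay(control_bits, always_on_stacks=1):
    """Assumed first-order model: delay scales as 1 / drive strength."""
    strength = drive_strength(control_bits, always_on_stacks)
    if strength <= 0:
        raise ValueError("buffer has no enabled stacks")
    return 1.0 / strength


print(drive_strength([1, 1, 0, 0]))  # 3
# Asserting more control bits gives a shorter delay in this model.
print(relative_delay([1, 1, 1, 0]) < relative_delay([1, 0, 0, 0]))  # True
```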

[0028] Those skilled in the art will appreciate that the number of control bits used to control a particular tunable buffer may be adjusted depending on the amount of skew that one wishes to adjust for and/or the amount of delay resolution one wishes to observe or have. It follows that using more control bits allows for a larger range of compensation and a finer time step. Further, those skilled in the art will appreciate that the number of transistor stacks used in a tunable buffer may also be changed according to the number of control bits one wishes to use.

[0029] Advantages of the present invention may include one or more of the following. In some embodiments, because a point on a clock grid is referenced as the point to which other points on the clock grid are aligned, clock skew is reduced.

[0030] In some embodiments, because localized adjustable DLLs are used to reduce global clock skew, clock skew introduced at the local distribution layer may be reduced.

[0031] In some embodiments, because clock skew is decreased by using localized DLLs having tunable buffers that are connected to a global clock grid, a skew budget for a clock distribution network may be decreased.

[0032] In some embodiments, because clock skew is reduced, system performance is increased.

[0033] While the invention has been described with respect to a limited number of embodiments, those skilled in the art, having benefit of this disclosure, will appreciate that other embodiments can be devised which do not depart from the scope of the invention as disclosed herein. Accordingly, the scope of the invention should be limited only by the attached claims.

What is claimed is:
 1. An integrated circuit, comprising: a local delay locked loop; an adjustable delay locked loop comprising a tunable buffer that is connected to a clock grid; and a phase detector that indicates to the adjustable delay locked loop whether the tunable buffer needs to be modulated based on a reference clock and a feedback clock, wherein the reference clock is operatively connected to the local delay locked loop, and wherein the feedback clock is operatively connected to the clock grid.
 2. The integrated circuit of claim 1, wherein the local delay locked loop outputs to a phase locked loop.
 3. The integrated circuit of claim 1, wherein the tunable buffer is interleaved between a regular clock grid buffer and at least one other regular clock grid buffer.
 4. The integrated circuit of claim 1, wherein the adjustable delay locked loop is disposed at a first location on the integrated circuit different from a second location where the local delay locked loop is disposed, and wherein the adjustable delay locked loop aligns a signal at the first location with a signal at the second location by varying a delay of the tunable buffer.
 5. The integrated circuit of claim 1, wherein the adjustable delay locked loop outputs the reference clock to another adjustable delay locked loop.
 6. The integrated circuit of claim 1, wherein the adjustable delay locked loop comprises a finite state machine that inputs a signal from the phase detector and outputs at least one control bit to the tunable buffer dependent on the signal from the phase detector.
 7. The integrated circuit of claim 6, wherein an output from the finite state machine is a control bus.
 8. The integrated circuit of claim 7, wherein the tunable buffer comprises: a transistor stack, wherein the transistor stack contributes current to an output of the tunable buffer dependent on the control bus.
 9. The integrated circuit of claim 7, wherein the transistor stack decreases current to an output of the tunable buffer dependent on the control bus.
 10. The integrated circuit of claim 8, wherein the transistor stack comprises a transistor having an input operatively connected to a control bit on the control bus.
 11. A method for reducing clock skew on a clock grid, comprising: inputting a reference clock operatively connected to a local delay locked loop, wherein the local delay locked loop resides at a first location; inputting a feedback clock operatively connected to the clock grid; determining whether the feedback clock needs to be modulated based on the reference clock; and selectively adjusting a delay of a buffer connected to the clock grid dependent on the determination.
 12. The method of claim 11, further comprising: generating pulses on an up/down signal to a finite state machine dependent on the determination; generating at least one control bit to the buffer in response to generating pulses on the up/down signal; and driving a signal on the clock grid based on the at least one control bit.
 13. The method of claim 11, further comprising: generating the reference clock to an adjustable delay locked loop.
 14. An integrated circuit having a clock grid, comprising: referencing means for referencing a first point on the clock grid to align at least one other point.
 15. The integrated circuit of claim 14, the referencing means comprising: adjusting means for varying a driver connected to the clock grid.
 16. A method for decreasing clock skew, comprising: referencing a point on a clock grid to which to align other points on the clock grid by varying at least one delay of at least one tunable buffer connected to the clock grid.